Constructing a Norwegian Academic Wordlist

Authors

  • Janne Bondi Johannessen
  • Arash Saidi
  • Kristin Hagen
Abstract

We present the development of a Norwegian Academic Wordlist (AKA list) for the Norwegian Bokmål variety. To identify specifically academic vocabulary, we developed a 100-million-word academic corpus based on the University of Oslo archive of digital publications. Other corpora were used for testing and for developing general word lists. We tested and compared two methods: those of Carlund et al. (2012) and Gardner & Davies (2013). The resulting list is presented on a website, where the words can be inspected in different ways and freely downloaded.

Similar resources

The Signed Languages of Eastern Europe

1. Purpose and scope 2. General survey methodology 3. Qualitative information 3.1 Eastern Europe 3.2 Bulgaria 3.3 Czech Republic 3.4 Estonia 3.5 Hungary 3.6 Latvia 3.7 Lithuania 3.8 Moldova 3.9 Poland 3.10 Romania 3.11 Russia 3.12 Slovakia 3.13 Ukraine 3.14 Republics and provinces of the former Yugoslavia: Bosnia and Herzegovina, Croatia, Kosovo, Macedonia, Montenegro, Serbia, Slovenia, Voivodi...

The Development of a Temporal Information Dictionary for Social Media Analytics

Dictionaries have been used to analyse text even before the emergence of social media and the use of dictionaries for sentiment analysis there. While dictionaries have been used to understand the tonality of text, so far it has not been possible to automatically detect if the tonality refers to the present, past, or future. In this research, we develop a dictionary containing time-indicating wo...

Modeling and Encoding Traditional Wordlists for Machine Applications

This paper describes work being done on the modeling and encoding of a legacy resource, the traditional descriptive wordlist, in ways that make its data accessible to NLP applications. We describe an abstract model for traditional wordlist entries and then provide an instantiation of the model in RDF/XML which makes clear the relationship between our wordlist database and interlingua approaches...

Credibility: Norwegian Students Evaluate Media Studies Web Sites

This paper investigates Norwegian university students’ evaluations of web site credibility and site authors’ vested interests with respect to a text-based academic site and an informational site with commercial support. Credibility ratings were higher for some aspects of the academic site even though the non-academic site was rated more highly in presentation design and currency. Negative correla...

Comparison of the South African Spondaic and CID W-1 wordlists for measuring speech recognition threshold

BACKGROUND The home language of most audiologists in South Africa is either English or Afrikaans, whereas most South Africans speak an African language as their home language. The use of an English wordlist, the South African Spondaic (SAS) wordlist, which is familiar to the English Second Language (ESL) population, was developed by the author for testing the speech recognition threshold (SRT) ...

Publication date: 2016